Instructions
In this mini-assignment, you will use Yelp + Census data to complete the following tasks. This data is for Fulton and DeKalb counties. The category used for the API request was “restaurants”. This data has not been cleaned in any way.
Points of Interest
Points of interest (POIs) are important for urban residents in various ways. POIs can add economic and social vitality to the neighborhood in which they are located. POIs that are close to residential locations can function as destinations that people can potentially walk to, creating a more walkable environment; in fact, the Walk Score of a location is calculated based on the number and diversity of POIs found within walking distance of that location. These benefits are likely to be greater if POIs are attractive and popular.
However, attractive POIs may not be distributed evenly across different neighborhoods; we anecdotally know that attractive POIs are more likely to be found in more advantaged neighborhoods, and more advantaged neighborhoods may enjoy more benefits from having attractive POIs nearby than their counterparts.
Research Question
But is there really such a relationship between the attractiveness of POIs in a neighborhood and how advantaged the neighborhood is? Which neighborhood characteristic has the strongest relationship with the attractiveness of POIs? This assignment aims to examine the relationship between being an advantaged neighborhood and having more attractive POIs using the ggplot2 package.
To complete this assignment, follow the directions below:
- Download the data prepared for this assignment from here.
This data was prepared using the following steps:
- Yelp data was downloaded for categories = ‘coffee’. This data covers Fulton, DeKalb, Clayton, Cobb, and Gwinnett counties.
- American Community Survey 5-Year Estimate for 2019 was downloaded
for the counties specified above. It contains
- median annual household income (hhincome)
- percent residents under poverty (pct_pov)
- percent residents who self-identify as white (pct_white)
- total population (race.tot).
- log-transformed version of median annual household income (hhincome_log)
- log-transformed version of percent residents under poverty (pct_pov_log)
- The two datasets were spatially joined. After joining, a few additional
columns were generated, including
- the number of businesses (yelp_n)
- average rating (avg_rating)
- number of reviews (review_count)
- log of the number of reviews (review_count_log)
- average price (avg_price)
- Using this data, re-create the following plots as closely as
possible. Make sure you provide the code you wrote to generate each plot.
When you re-create them, you DO NOT need to match the plots’ aesthetics
exactly. For example,
- When custom colors are used, the choice of colors does not matter as long as you appropriately use some custom colors of your choice to display the designated data;
- When opacity is used, the actual level of opacity doesn’t matter as long as a reasonable level of opacity is applied.
- Other minor aesthetics (e.g., aspect ratio of plots) do not matter. If you want to modify them for aesthetics, feel free to do so.
- For each of the plots, write a few sentences to describe your findings.
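As a reminder of how opacity and custom colors are typically set in ggplot2, here is a minimal sketch. It runs on a small made-up data frame (the values and the three hex colors are illustrative assumptions, not taken from the assignment data) and is not meant to reproduce any of the target plots.

```r
library(ggplot2)

# Toy data standing in for the assignment dataset; all values are made up.
toy <- data.frame(
  hhincome   = c(35000, 52000, 78000, 110000, 64000, 91000),
  avg_rating = c(3.0, 3.5, 4.0, 4.5, 3.5, 4.0),
  county     = c("Fulton", "Fulton", "DeKalb", "DeKalb", "Cobb", "Cobb")
)

# alpha controls opacity; scale_color_manual() assigns custom colors by group.
p <- ggplot(toy, aes(x = hhincome, y = avg_rating, color = county)) +
  geom_point(alpha = 0.5, size = 3) +
  scale_color_manual(
    values = c(Fulton = "#1b9e77", DeKalb = "#d95f02", Cobb = "#7570b3")
  ) +
  labs(x = "Median household income", y = "Average rating", color = "County")

print(p)
```

Any reasonable `alpha` value and any palette of your choice would serve the same purpose.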
Plot 1.
Variables used - avg_rating, hhincome
Plot 2.
Variables used - avg_rating, hhincome, county
Plot 3.
Variables used - review_count_log, hhincome, county, pct_white
Plot 4.
Variables used - pct_pov_log, hhincome, pct_white, race.tot,
review_count_log, county
Hint: I used pivot_longer() to create Plot 4.
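For readers unfamiliar with pivot_longer(), the sketch below shows the general idea on a made-up data frame: several wide columns are stacked into a name/value pair so that each original column can get its own facet. The toy values, the column names `variable` and `value`, and the choice of facet_wrap() are assumptions for illustration only; this is not the code used to produce Plot 4.

```r
library(tidyr)
library(ggplot2)

# Toy data; all values are made up for illustration.
toy <- data.frame(
  county           = c("Fulton", "DeKalb", "Cobb", "Gwinnett"),
  review_count_log = c(2.1, 3.4, 1.8, 2.9),
  hhincome         = c(48000, 72000, 65000, 58000),
  pct_white        = c(0.35, 0.60, 0.55, 0.40)
)

# Stack hhincome and pct_white into two columns:
# "variable" holds the original column name, "value" holds its value.
long <- pivot_longer(
  toy,
  cols      = c(hhincome, pct_white),
  names_to  = "variable",
  values_to = "value"
)

# One panel per original column; free y scales because the units differ.
p <- ggplot(long, aes(x = review_count_log, y = value)) +
  geom_point() +
  facet_wrap(~ variable, scales = "free_y")

print(p)
```

The long format has one row per county per stacked variable, which is what lets a single ggplot() call draw several panels.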
Logistics
- Write your report and R code in an R Markdown document.
- Use the Knit button in RStudio to render it as an HTML document.
- Publish the rendered document to RPubs.
- Submit the URL of the RPubs document through Canvas by 11:59 PM on Monday, 10/10/2022.
- To submit, go to Canvas > Assignments > Mini assignment 4.
- Grades and feedback will be posted through Canvas.
Notes
Try knitting your R Markdown document and publishing it on RPubs early, so that you have time to troubleshoot if you run into technical issues.
You can always replace a published document (i.e., republish) or delete the existing one and publish it as a new document.